Summary

Quantum mechanics permits certain kinds of non-local effects. This paper demonstrates how these can be used 
for distributed computation with minimal communication between various processors. The problem considered is that 
of estimating the mean of N items to a precision $\epsilon$, i.e. if there are N numbers that lie in the range -1 to 1, the estimate, $\mu_e$, should be such that, with a large probability, the actual mean of the N numbers lies in the range $\mu_e \pm O(\epsilon)$.

(i) Any classical algorithm for this will need at least $\Omega(1/\epsilon^2)$ samples. By using quantum mechanical effects this paper first presents a serial algorithm that requires only $O(1/|\epsilon|^{1.5})$ time steps.

(ii) Using a quantum mechanical system consisting of $\eta$ coupled EPR particles, the computation time in (i) above can be reduced by a factor of $O(\eta)$. The $\eta$ coupled EPR particles can be remotely distributed and work independently. Each of them has just to transmit one bit of classical information to a central location.

0.0 Background Paradoxical effects with coupled particles are well known in quantum mechanics. There are 
three well known examples of this. The first is the original EPR paradox. This was proposed by Einstein (a famous 
skeptic of quantum mechanics) and others in 1935 and was meant to deal a deathblow to quantum mechanics. In this, 
two particles that were once in close proximity and then separated, somehow stay coupled. Operations carried out on 
one particle instantaneously affect the other particle. This seemed to violate the constraint that physical influences 
cannot propagate faster than the speed of light. However, it was soon realized that this could not be used to transmit 
any information and thus did not violate any physical law. 

Also in 1935, Schrodinger devised the paradox of the Schrodinger cat. In this, a macroscopic object (a cat) is in a 
superposition of two states, one in which it is dead and the other in which it is alive, and the question is regarding 
what really happens when it collapses into a well defined state. The results of this paradox though puzzling did not 
violate any physical law. This paper uses a Schrodinger cat state, i.e. a large system in a superposition of two states. 
The computational power of the system is shown to be proportional to the number of particles in the system. 

The third paradoxical effect of coupled particle states is quantum teleportation. In this a quantum state can be 
transmitted between two parties that share an EPR pair, merely by transmitting two bits of classical information and 
carrying out some processing at each end. This too, though seemingly paradoxical, was after some analysis, seen not 
to violate any physical law since the speed at which any information is transmitted is limited by the time it takes for 
the classical information to get through. 

0.1 This paper The reason for these paradoxes is that quantum mechanics permits non-local effects. This paper 
uses such non-local effects to carry out quantum computing when the processors are at remote locations. A group of 
$\eta$ particles is placed in just two composite states. This is called a cat state by analogy with Schrodinger's cat, i.e. a system of many particles that is in two possible states. These $\eta$ particles are separated to different locations; each particle can then independently carry out its computations and convey the necessary information to the base station, just by transmitting one bit of classical information. The overall computation time is faster by a factor of $O(\eta)$.

The other contribution of this paper is a novel quantum mechanical algorithm for an important computer science 
problem which is faster than any classical algorithm. This is then adapted to distributed computation. The computer 
science problem is that of estimating the mean of N items to a precision $\epsilon$, i.e. if there are N numbers that lie in the range -1 to 1, the estimate $\mu_e$ should be such that, with a large probability, the actual mean of the N numbers lies in the range $\mu_e \pm O(\epsilon)$. In case $\nu$ random samples are picked classically, it follows from the central limit theorem that the mean of the $\nu$ samples will have a gaussian distribution centered about the actual mean with a standard deviation of $O(1/\sqrt{\nu})$. Therefore, with a probability approaching unity, the estimated mean lies within $O(1/\sqrt{\nu})$ of the true mean. In order for the uncertainty due to this variation to become less than $\epsilon$, $O(1/\sqrt{\nu})$ must be smaller than $\epsilon$, i.e. $\nu$ must be greater than $\Omega(1/\epsilon^2)$. Therefore $\Omega(1/\epsilon^2)$ samples are needed to estimate the mean with a precision of $\epsilon$. Because quantum mechanical algorithms can simultaneously explore multiple states, it is not clear what (if any) the lower bound for estimating the mean is. This paper gives an algorithm for estimating the mean in $O(1/|\epsilon|^{1.5})$ elementary quantum mechanical steps.
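The classical sampling argument can be illustrated with a minimal Python sketch. The function name `classical_mean_estimate`, the fixed seed, and the choice of exactly `ceil(1/eps**2)` samples (constant factor 1) are assumptions made for illustration; the text only fixes the order of growth.

```python
import math
import random


def classical_mean_estimate(values, eps, seed=0):
    """Estimate the mean of `values` (each in [-1, 1]) from random samples.

    Draws nu = ceil(1 / eps**2) samples with replacement, so the standard
    deviation of the sample mean is at most about eps.
    """
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    nu = math.ceil(1 / eps ** 2)       # number of samples, of order 1/eps^2
    total = sum(rng.choice(values) for _ in range(nu))
    return total / nu, nu
```

Halving `eps` quadruples the number of samples drawn, which is the $1/\epsilon^2$ cost that the quantum algorithm is compared against.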



1. Quantum mechanical algorithms Quantum computation and quantum teleportation are two excit- 
ing new developments in quantum physics. Quantum computation allows an exponential amount of computing to 
simultaneously take place in a single piece of hardware. Quantum teleportation allows the transmission of a complex 
state with arbitrary precision (provided the proper initial conditions are set up) just by transmitting a few bits of clas- 
sical information. This paper shows that significant quantum computation can take place with the processors being 
remotely located with each processor needing to transmit at most one bit of classical information. 

A good starting point to think of quantum mechanical algorithms is probabilistic algorithms [Model93] (e.g. 
simulated annealing). In these algorithms, instead of the system being in a specified state, it is in a distribution over 
various states with a certain probability of being in each state. At each step, there is a certain probability of making a 
transition from one state to another. The evolution of the system is obtained by premultiplying this probability vector 
(that describes the distribution of probabilities over various states) by a state transition matrix. Knowing the initial 
distribution and the state transition matrix, it is possible, in principle, to calculate the distribution at any later time. 

Just like classical probabilistic algorithms, quantum mechanical algorithms work with a probability distribu- 
tion over various states. However, unlike classical systems, the probability vector does not completely describe the 
system. In order to completely describe the system, we need the amplitude in each state which is a complex number. 
The evolution of the system is obtained by premultiplying this amplitude vector (that describes the distribution of 
amplitudes over various states) by a transition matrix, the entries of which are complex in general. The probabilities 
in any state are given by the square of the absolute values of the amplitude in that state. It can be shown that in order 
to conserve probabilities, the state transition matrix has to be unitary [Model93]. 

The machinery of quantum mechanical algorithms is illustrated by discussing two operations that are needed in the algorithm of this paper. The first is the creation of a configuration in which the amplitude of the system being in any of the $2^n$ basic states of the system is equal; this is generalized to give the Walsh-Hadamard (WH) transformation. The second is the selective rotation of the phase of different states.

A basic operation in quantum computing is that of a "fair coin flip" performed on a single bit whose states are 0 and 1 [Search96]. This operation is represented by the following matrix:

$$M = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

A bit in the state 0 is transformed into a superposition in the two states: $(1/\sqrt{2}, 1/\sqrt{2})$. Similarly a bit in the state 1 is transformed into $(1/\sqrt{2}, -1/\sqrt{2})$, i.e. the magnitude of the amplitude in each state is $1/\sqrt{2}$ but the phase of the amplitude in the state 1 is inverted. The phase does not have an analog in classical probabilistic algorithms. It comes about in quantum mechanics since the amplitudes are in general complex. This results in interference of different possibilities as in wave mechanics and is what distinguishes quantum mechanical systems from classical probabilistic systems.
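The role of the phase can be seen in a small Python sketch of the "fair coin flip". The helper names `apply` and `probabilities` are introduced here for illustration only. Applying M once gives equal probabilities for both outcomes, while applying it twice returns the bit to its starting state because the two paths into state 1 interfere destructively, something a classical coin flip cannot do.

```python
import math

S = 1 / math.sqrt(2)
# The "fair coin flip" matrix M from the text.
M = [[S, S],
     [S, -S]]


def apply(matrix, state):
    """Premultiply a 2-component amplitude vector by a 2x2 matrix."""
    return [sum(matrix[r][c] * state[c] for c in range(2)) for r in range(2)]


def probabilities(state):
    """Probability of each state: squared absolute value of its amplitude."""
    return [abs(a) ** 2 for a in state]


once = apply(M, [1, 0])    # bit starts in state 0
twice = apply(M, once)     # the amplitudes for state 1 cancel
```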

In a system in which the states are described by n bits (the system has $2^n$ possible states) we can perform the transformation M on each bit independently in sequence, thus changing the state of the system. The state transition matrix representing this operation will be of dimension $2^n \times 2^n$. In case the initial configuration was the configuration with all n bits in the 0 state, the resultant configuration will have an identical amplitude in each of the $2^n$ states. This is a way of creating a distribution with the same amplitude in all $2^n$ states.

Next consider the case when the starting state is another one of the $2^n$ states, i.e. a state described by an n bit binary string with some 0s and some 1s. The result of performing the transformation M on each bit will be a superposition of states described by all possible n bit binary strings, with the amplitude of each state having an equal magnitude and sign either + or -. To deduce the sign, observe from the definition of M, i.e. $M = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, that the sign of the resulting state is changed only when a bit that was initially a 1 remains a 1 after the transformation. Hence if $\bar{x}$ is the n-bit binary string describing the starting state and $\bar{y}$ the n-bit binary string describing the resulting state, the sign of the amplitude of $\bar{y}$ is determined by the parity of the bitwise dot product of $\bar{x}$ and $\bar{y}$, i.e. $(-1)^{\bar{x} \cdot \bar{y}}$. This transformation is referred to as the Walsh-Hadamard (WH) transformation [Transform92]. It is one of the features that makes quantum mechanical algorithms more powerful than classical algorithms and, either this or a closely related transform called the Fourier Transform, forms the basis for most significant quantum mechanical algorithms.
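The sign rule can be written out directly. The sketch below is a plain $O(N^2)$ Python illustration of the WH transformation, not an efficient implementation; the function name `wh_transform` is an assumption. It uses the parity of the bitwise AND of the two indices as the bitwise dot product.

```python
import math


def wh_transform(amplitudes):
    """Walsh-Hadamard transform of a list of 2**n amplitudes.

    Entry (x, y) of the matrix is 2**(-n/2) * (-1)**(x . y), where x . y is
    the bitwise dot product of the n-bit strings for x and y.
    """
    size = len(amplitudes)
    norm = 1 / math.sqrt(size)
    result = []
    for y in range(size):
        total = 0
        for x in range(size):
            parity = bin(x & y).count("1") % 2   # parity of the bitwise dot product
            total += -amplitudes[x] if parity else amplitudes[x]
        result.append(norm * total)
    return result
```

Starting from the all-zero string the result has amplitude $2^{-n/2}$ everywhere, and applying the transform twice returns the original vector, which is the self-inverse property used later in the paper.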

The other transformation that we need is the selective rotation of the phase of the amplitude in certain states. The transformation matrix describing this for a 2 state system is of the form:

$$\begin{pmatrix} e^{i\phi_1} & 0 \\ 0 & e^{i\phi_2} \end{pmatrix}$$

where $i = \sqrt{-1}$ and $\phi_1, \phi_2$ are arbitrary real numbers. Unlike the WH transformation and other state transition matrices, the probability in each state stays the same since the square of the absolute value of the amplitude in each state stays the same. In the following algorithm, the application of this phase rotation transform will be written as: the phase of the $j^{th}$ state is rotated by $\phi_j$ radians. This conditional phase shift is the most difficult to implement; it requires two qubits and is responsible for quantum entanglement in the system, which is what leads to non-local effects.



2.0 Telecomputation Distributed computing is the problem where multiple processors are remotely distributed and there is a cost associated with communication. The issue in distributed computing is how to optimally divide
the problem among the various processors. The algorithms are, in general, different depending on the specific nuance 
of the problem, e.g. how severe is the cost of communication as compared to computing, how much of the data does 
each processor have access to, etc. [Qntcmm97] has very recently (independently) shown that quantum entanglement 
can indeed aid in multi-party communication. This paper shows that a certain important computer science problem 
can be solved efficiently with distributed quantum processors. The problem is that of finding the average of N real 
numbers in the range [-1,1] to a specified precision. The following is the organization of the rest of the paper: 

(i) Sections 2.1 & 2.2 consider the problem where the number of processors exactly equals the number of pieces of data. Each processor has access to just one piece of data. The processors can all be at different locations and carry out their computations independently. At the end, each processor has just to transmit one bit of classical information to a base location. This algorithm is similar to a technique that was independently invented in a very different context, i.e. frequency standards & spectroscopy [Freq96], as Artur Ekert pointed out.

(ii) Sections 3 & 4 consider the problem where the number of processors is only a fraction of the number of pieces of data. For this, each processor needs to carry out several computations before the results of all the processors can be combined. There are thus two parts:

(a) First is a serial $O(1/|\epsilon|^{1.5})$ step quantum mechanical algorithm for estimating the mean of N numbers in the range [-1, 1] (any classical algorithm will take at least $\Omega(1/\epsilon^2)$ steps). (Sections 3.3 & 3.4).

(b) Next, it is shown that with $\eta$ appropriately initialized quantum processors at remote locations, it is possible to carry out the overall computation $O(\eta)$ times faster. Each processor operates independently; at the end it transmits just one bit of classical information to the base location. (Sections 4.0, 4.1 & 4.2).

2.1 First Problem Consider the following problem: there are $N = 2^n$ states $S_0, S_1, \ldots S_{N-1}$, each of which has an associated scalar value $v_j$ in the range [-1, 1]. Assume that there is a quantum processing element for each state (since each quantum processor is usually just a single particle, this is not extravagant in terms of the hardware required). The mean $\mu$ is defined by $\mu = \frac{1}{N} \sum_{j=0}^{N-1} v_j$. Let $\mu$ be known to be $O(\theta^2)$ (where $\theta$ is a small quantity). The object is to estimate this mean with a precision of $O(\theta^2)$, i.e. the estimate $\mu_e$ should be such that, with a large probability, the actual mean of the N numbers lies in the range $\mu_e \pm O(\theta^2)$.

The problem when the order of the mean is unknown can be solved in a logarithmic number of repetitions of the algorithm used to solve the above problem. The order of the mean can be estimated by initially assuming the mean ($\mu$) to be bounded by a large quantity ($\theta$ is large), estimating the mean, and scaling $\theta$ down in each iteration until $\mu_e$ becomes non-trivial. For example, assume $\mu$ to be $2 \times 10^{-8}$. Initially $\theta$ would be set to a large value, say 0.5, and the mean estimated with a precision of $O(\theta^2)$, say $0.1\theta^2$, which is $2.5 \times 10^{-2}$; the result will therefore be somewhere in the range $\pm 0.1\theta^2$, i.e. $\pm 2.5 \times 10^{-2}$. Next $\theta$ is lowered by a constant factor (say 1.5) and the procedure repeated. The result will again be in the range $\pm 0.1\theta^2$ (with the reduced value of $\theta$). $\theta$ would be repeatedly lowered until the absolute value of the result becomes greater than $0.1\theta^2$. This happens after 18 repetitions, when $\theta$ has fallen to $3.38 \times 10^{-4}$. A similar procedure is also described in [Median96].
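The count of 18 repetitions can be reproduced with a few lines of Python. This is only a sketch of the bookkeeping: it assumes an idealized estimator that resolves the mean as soon as $|\mu|$ exceeds $0.1\theta^2$, and it takes $\mu = 2 \times 10^{-8}$ as an illustrative value chosen to be consistent with the quoted count. The function name and parameters are hypothetical.

```python
def repetitions_until_resolved(mu, theta=0.5, factor=1.5, c=0.1):
    """Lower theta by `factor` until |mu| exceeds the resolution c * theta**2.

    Returns the number of times theta was lowered and its final value.
    """
    if mu == 0:
        raise ValueError("a mean of exactly zero is never resolved")
    count = 0
    while abs(mu) <= c * theta ** 2:
        theta /= factor
        count += 1
    return count, theta


count, theta = repetitions_until_resolved(2e-8)
```

Because $\theta$ shrinks geometrically, the number of repetitions grows only logarithmically in $1/|\mu|$.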

2.2 Algorithm The following steps estimate $\mu$ with a precision of $O(\theta^2)$ ($\theta$ was just defined above in section 2.1, by the statement that $\mu$ be known to be $O(\theta^2)$). Angular brackets represent averages, e.g. $\langle v \rangle = \frac{1}{N} \sum_j v_j$.

Consider the following quantum state vector of N particles: $(|0\rangle_0 |0\rangle_1 \ldots |0\rangle_{N-1} + |1\rangle_0 |1\rangle_1 \ldots |1\rangle_{N-1})$; this is referred to as an N particle EPR system (as mentioned earlier, this is called a cat state by analogy with Schrodinger's cat, i.e. a system of many particles that is in two overall states). The state has to be prepared when the N particles are together. However once the state is prepared, they can be separated and taken to distant locations. The phase of the $|0\rangle$ state of the $j^{th}$ particle is rotated by $\frac{v_j}{N\theta}$. It follows that the state vector of the system after this operation becomes: $(|0\rangle_0 |0\rangle_1 \ldots |0\rangle_{N-1} \exp(i\langle v \rangle/\theta) + |1\rangle_0 |1\rangle_1 \ldots |1\rangle_{N-1})$. Note that the particles are still physically separated.

Next apply the operation M, as described in section 1.1, to each of the (N-1) particles: (1, 2...N-1). The state vector becomes proportional to: $(|0\rangle_0 (|0\rangle + |1\rangle)_1 \ldots (|0\rangle + |1\rangle)_{N-1} \exp(i\langle v \rangle/\theta) + |1\rangle_0 (|0\rangle - |1\rangle)_1 \ldots (|0\rangle - |1\rangle)_{N-1})$.

Now if the state of each of the (N-1) particles (1, 2...(N-1)) is measured, it follows that the state of the $0^{th}$ particle becomes $(|0\rangle_0 \exp(i\langle v \rangle/\theta) \pm |1\rangle_0)$ where the sign (plus or minus) depends on the results of the (N-1) measurements. In case an even number of the (N-1) measurements be 1's, the sign is positive; if an odd number be 1's, the sign is negative. Therefore if the results of all these measurements be transmitted to the location of the $0^{th}$ particle, the state of the $0^{th}$ particle becomes precisely known, i.e. we have a single binary state particle where the phase of one of the states is rotated by a constant amount proportional to $\langle v \rangle$.

A constant phase shift can be estimated by an interference type experiment. For example consider the matrix $M = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ as discussed in section 1.1. When M acts on a state vector proportional to $\begin{pmatrix} 1 \\ \exp(i\phi) \end{pmatrix}$, the resulting state vector is proportional to $\begin{pmatrix} 1 + \exp(i\phi) \\ 1 - \exp(i\phi) \end{pmatrix}$. The ratio of probabilities in the two states is $\cos^2(\phi/2) : \sin^2(\phi/2)$. Therefore if the state of the system now be measured, it is possible to estimate the phase $\phi$. By repeating this experiment $\alpha$ times, the phase can be estimated with a precision of $1/\sqrt{\alpha}$.
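The interference experiment can be mimicked classically by sampling from the two outcome probabilities. The Python sketch below is illustrative only: the function names, the fixed random seed, and the inversion formula $\phi = 2\arccos\sqrt{f_0}$ (with $f_0$ the observed fraction of 0 outcomes, valid for $0 \le \phi \le \pi$) are assumptions introduced here.

```python
import cmath
import math
import random


def interference_probabilities(phi):
    """Outcome probabilities after M acts on the normalized state (1, e^{i phi})/sqrt(2)."""
    s = 1 / math.sqrt(2)
    a0, a1 = s, s * cmath.exp(1j * phi)
    b0, b1 = s * (a0 + a1), s * (a0 - a1)
    return abs(b0) ** 2, abs(b1) ** 2      # cos^2(phi/2), sin^2(phi/2)


def estimate_phase(phi, alpha, seed=0):
    """Estimate phi (assumed to lie in [0, pi]) from `alpha` simulated measurements."""
    rng = random.Random(seed)
    p0, _ = interference_probabilities(phi)
    zeros = sum(rng.random() < p0 for _ in range(alpha))
    return 2 * math.acos(math.sqrt(zeros / alpha))
```

The statistical error of the returned estimate shrinks as $1/\sqrt{\alpha}$, matching the precision quoted in the text.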

The above applies both to the case where the data is localized and also when it is distributed. In the former case 
it gives a parallel algorithm for adding N numbers by doing N simultaneous unitary transforms on N particles and 
then calculating the XOR of N bits. Alternatively, in the more interesting case, if the data is distributed, the benefit is
that instead of transmitting all of the data to the base location, we have to transmit just one bit of classical data from 
each location to the base location. Since the XOR is commutative and associative, it follows that in case some of the 
data were located together, then instead of transmitting one bit for each datapoint, we could XOR the local results and 
just transmit the result of the local XOR. 
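The protocol of this section can be checked on a small state-vector simulation. The Python sketch below is a toy model under stated assumptions: particle k is bit k of the basis-state index, the rotation applied to particle j is $v_j/(N\theta)$, and the caller supplies the measurement outcomes of particles 1..N-1 rather than sampling them. The names `hadamard_on` and `cat_protocol` are hypothetical.

```python
import cmath
import math
from functools import reduce
from operator import xor


def hadamard_on(state, k):
    """Apply M to particle k (bit k of the basis-state index)."""
    s = 1 / math.sqrt(2)
    out = list(state)
    for idx in range(len(state)):
        if not (idx >> k) & 1:
            lo, hi = idx, idx | (1 << k)
            out[lo] = s * (state[lo] + state[hi])
            out[hi] = s * (state[lo] - state[hi])
    return out


def cat_protocol(values, theta, outcome):
    """Return the relative phase left on particle 0, given outcome bits for particles 1..N-1."""
    n = len(values)
    dim = 1 << n
    state = [0j] * dim
    state[0] = state[dim - 1] = 1 / math.sqrt(2)       # |00...0> + |11...1>
    for j, v in enumerate(values):                      # rotate phase of |0> of particle j
        phase = cmath.exp(1j * v / (n * theta))
        for idx in range(dim):
            if not (idx >> j) & 1:
                state[idx] *= phase
    for k in range(1, n):                               # M on particles 1..N-1
        state = hadamard_on(state, k)
    base = sum(bit << (k + 1) for k, bit in enumerate(outcome))
    a0, a1 = state[base], state[base | 1]               # amplitudes of particle 0
    sign = -1 if reduce(xor, outcome, 0) else 1         # one classical bit: the XOR
    return cmath.phase(a0 / (sign * a1))
```

Whatever the individual outcomes, only their XOR is needed to fix the sign, so co-located particles can XOR their results locally and forward a single bit.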

3.1 Second Problem Consider the problem discussed in section 2.1, with the difference that the number of processors, $\eta$, is only a small fraction of the number of datapoints N (previously the number of processors was equal to the number of datapoints). Sections 3.3 & 3.4 give an $O(1/\theta^3)$ step serial algorithm using just one quantum processor. This corresponds to the situation discussed in section 0.1 with $\theta^2 = \epsilon$; i.e. in the terminology of section 0.1, the complexity is $O(1/|\epsilon|^{1.5})$. Sections 4.0, 4.1 & 4.2 consider the multiprocessor case.

3.2 Outline Given a particle in a state $|0\rangle$, the following steps (described in section 3.3) rotate the phase to produce the state $|0\rangle \exp(i\phi)$. Here $\phi$ is proportional to $\langle v \rangle + O(\mathrm{Max}(v_j^3))$. The second term, i.e. $O(\mathrm{Max}(v_j^3))$, is an error term. In order to reduce its relative contribution, scale the values of each state by $\theta$ and define $x_j = \theta v_j$. Hence each $x_j$ is $O(\theta)$ and $\langle x \rangle$ is $O(\theta^3)$ and so the relative value of the error gets reduced. The following steps estimate $\langle x \rangle$ with a precision of $O(\theta^3)$, thus leading to an estimate of the mean, $\mu$, with a precision of $O(\theta^2)$ ((i) & (iii) are the same steps as already carried out in section 2.2).

(i) Starting from $(|0\rangle + |1\rangle)$, the system is put into the state $(|0\rangle \exp(i\phi) + |1\rangle)$, where $\phi$ is proportional to $\langle x \rangle$.

(ii) This relative phase, $\phi$, can be measured by making it apparent in the relative probabilities of two states in an interference type experiment and doing a statistical sampling of several such systems.

(iii) This idea can be extended to a "cat state" (an $\eta$ particle EPR system with the $\eta$ particles at remote locations) where all the phase rotation occurs in the same term and hence adds.

3.3 Algorithm This section discusses the serial algorithm. Given a particle in a state $|0\rangle$, the following steps rotate the phase to produce the state $|0\rangle \exp(i\phi)$. Here $\phi$ is proportional to $\langle x \rangle$ and is of magnitude $O(1)$ (as mentioned earlier, $\langle x \rangle$ is $O(\theta^3)$ and so the proportionality constant relating $\phi$ to $\langle x \rangle$ will have to be $O(1/\theta^3)$). Each pass through the steps below succeeds with a probability of $1 - O(\theta^4)$ (see step (viii)).

The algorithm, though stated below for a single state $|0\rangle$, can be easily extended to the case where the phase of only one of the many states of the system needs to be rotated, i.e. starting from $|0\rangle + |1\rangle$, produce the state $|0\rangle \exp(i\phi) + |1\rangle$. This is achieved by carrying out the operations conditioned on the fact that the system is in state $|0\rangle$. It also applies to the case where the system is arbitrarily coupled to the environment, i.e. starting from the state $|0\rangle|e\rangle + |1\rangle|f\rangle$, produce the state $|0\rangle|e\rangle \exp(i\phi) + |1\rangle|f\rangle$.

(i) Starting from the $|0\rangle$ state (the state the phase of which needs to be rotated), apply the WH transform to initialize the system so that there is the same amplitude in each of the $N = 2^n$ states (as described in section 1).

Amplitude in every state: $\frac{1}{\sqrt{N}}$

(ii) In case the system is in the $j^{th}$ state, rotate the phase by $\gamma_j$ where $\sin\gamma_j = x_j$ (since $x_j$ is small, $\gamma_j \approx x_j$).

Amplitude in the $j^{th}$ state: $\frac{1}{\sqrt{N}}\left(\sqrt{1 - x_j^2} + i x_j\right)$

(iii) Apply the WH Transform as discussed in section 1.1: $W_{pq} = 2^{-n/2} (-1)^{\bar{p} \cdot \bar{q}}$, where $\bar{p}$ is the binary representation of p, and $\bar{p} \cdot \bar{q}$ is the bitwise dot product of the n bit strings $\bar{p}$ & $\bar{q}$.

Amplitude in the $0^{th}$ state: $\langle \sqrt{1 - x^2} \rangle + i\langle x \rangle$

(iv) In case the system is in the $0^{th}$ state, rotate the phase by $\pi$ radians.

(v) Apply the WH Transform.

Amplitude in the $j^{th}$ state: $\frac{1}{\sqrt{N}}\left(\sqrt{1 - x_j^2} - 2\langle \sqrt{1 - x^2} \rangle\right) + \frac{i}{\sqrt{N}}\left(x_j - 2\langle x \rangle\right)$

(vi) In case the system is in the $j^{th}$ state, rotate the phase by $\gamma_j$ where $\sin\gamma_j = x_j$ (since $x_j$ is small, $\gamma_j \approx x_j$).

Amplitude in the $j^{th}$ state (as derived in section 3.4): $\left(-\frac{1}{\sqrt{N}} - \frac{2i\langle x \rangle}{\sqrt{N}}\right) + \frac{2}{\sqrt{N}}\left(\sqrt{1 - x_j^2} - \langle \sqrt{1 - x^2} \rangle\right) + \left(\frac{1}{\sqrt{N}}O(\theta^4) + \frac{i}{\sqrt{N}}O(\theta^3)\right)$

(vii) Apply the WH Transform.

Amplitude in the $0^{th}$ state: $(-1 - 2i\langle x \rangle + (O(\theta^4) + iO(\theta^3)))$

(viii) Make an observation that measures whether or not the system is in the $0^{th}$ state. With a probability of $(1 - O(\theta^4))$, the system will indeed be in the $0^{th}$ state with an amplitude proportional to $(-1 - 2i\langle x \rangle + (O(\theta^4) + iO(\theta^3)))$; in case it is not in the $0^{th}$ state, stop and start from the beginning.

The net effect of the algorithm is to rotate the phase of the amplitude of the $|0\rangle$ state by approximately $2\langle x \rangle \pm O(\theta^3)$. If this procedure be repeated r times, the phase gets rotated by approximately $2r\langle x \rangle$. Since $\langle x \rangle$ is $O(\theta^3)$, it follows that if the number of repetitions, r, be chosen to be $O(1/\theta^3)$, the phase gets rotated by a finite amount proportional to $\langle x \rangle$. As described earlier, in section 2.2, a finite phase shift can be estimated by interacting with another state in the reference phase; note that this will require $O(\alpha/\theta^3)$ repetitions of the sequence (i),...(viii) in the algorithm since, as just mentioned, it requires $O(1/\theta^3)$ repetitions of the sequence (i)(ii)...(viii) to rotate the phase by $O(1)$. Since $\alpha$ is $O(1)$, the total number of steps required will thus be $O(1/\theta^3)$, which is $O(1/|\epsilon|^{1.5})$. In the measurement of step (viii), the probability of the system not being in the $0^{th}$ state is $O(\theta^4)$; it follows that in $O(1/\theta^3)$ repetitions of the procedure there will only be an $O(\theta)$ probability of observing the system in a state other than the $0^{th}$ state.

Hence $\langle x \rangle$ can be estimated with a precision of $O(\theta^3)$; $\mu$, which is $\langle x \rangle/\theta$, is thus estimated with a precision of $O(\theta^2)$ in accordance with the problem specification of section 2.1.
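One pass through steps (i) to (vii) can be simulated numerically for a small N to check the stated amplitude of the $0^{th}$ state. The Python sketch below is a classical simulation for illustration only; `wh` is a standard fast Walsh-Hadamard butterfly and `one_pass` is a hypothetical name. The input list holds the scaled values $x_j$ and its length is assumed to be a power of 2.

```python
import math


def wh(state):
    """Fast Walsh-Hadamard transform with 1/sqrt(N) normalization."""
    out = list(state)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for k in range(start, start + h):
                a, b = out[k], out[k + h]
                out[k], out[k + h] = a + b, a - b
        h *= 2
    norm = 1 / math.sqrt(len(out))
    return [norm * a for a in out]


def one_pass(xs):
    """Apply steps (i)-(vii) to |0> and return the final amplitude vector."""
    # exp(i * gamma_j) with sin(gamma_j) = x_j
    rot = [complex(math.sqrt(1 - x * x), x) for x in xs]
    state = [0j] * len(xs)
    state[0] = 1                                  # start in the |0> state
    state = wh(state)                             # (i)   equal amplitudes
    state = [a * r for a, r in zip(state, rot)]   # (ii)  rotate state j by gamma_j
    state = wh(state)                             # (iii)
    state[0] = -state[0]                          # (iv)  rotate 0th state by pi
    state = wh(state)                             # (v)
    state = [a * r for a, r in zip(state, rot)]   # (vi)  rotate state j by gamma_j
    return wh(state)                              # (vii)
```

For values with $x_j = O(\theta)$ and $\langle x \rangle = O(\theta^3)$, the first entry of the returned vector should be close to $-1 - 2i\langle x \rangle$, and the probability of finding the system outside the $0^{th}$ state should be small.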

3.4 How does the algorithm work? The iteration sequence (i), (ii)...(viii) in section 3.3 rotates the phase of the state $|0\rangle$ by an amount proportional to $\langle x \rangle$. In order to see this, observe:

(ii) Denote the amplitude after step (ii) in state j by $a_j$, i.e. $a_j = \frac{1}{\sqrt{N}}\left(\sqrt{1 - x_j^2} + i x_j\right)$. The state vector after step (ii) is thus $(a_0, a_1 \ldots a_{N-1})$.

(iii) Denote the amplitude in state j, after the WH transform of step (iii), by $w_j$. The state vector after step (iii) is thus $(w_0, w_1 \ldots w_{N-1})$. It follows by the definition of the WH transform in section 1.1 that $w_0 = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} a_j$, which by using the value for $a_j$ from (ii) above, becomes $w_0 = \left(\langle \sqrt{1 - x^2} \rangle + i\langle x \rangle\right)$.

As can be seen by the expression for $w_0$, the phase after (iii) in the $0^{th}$ state is proportional to $\langle x \rangle$. However, if we measure whether or not the system is in the $0^{th}$ state, there will be a relatively high probability of it collapsing to a non-zero state. The following steps (iv)..(viii) serve to reduce this probability.

(iv) By inverting the phase of the $0^{th}$ state (in step (iv)), the state vector becomes: $(-w_0, w_1 \ldots w_{N-1})$.

(v) The state vector after step (iv) can be written as: $(-w_0, w_1 \ldots w_{N-1}) = (-2w_0, 0, 0 \ldots 0) + (w_0, w_1 \ldots w_{N-1})$. Since the WH transform is its own inverse, it follows that the WH transform of $(w_0, w_1 \ldots w_{N-1})$ is $(a_0, a_1 \ldots a_{N-1})$. By the definition of the WH transform in section 1.1, the WH transform of $(-2w_0, 0, 0 \ldots 0)$ is $\left(-\frac{2w_0}{\sqrt{N}}, -\frac{2w_0}{\sqrt{N}}, \ldots -\frac{2w_0}{\sqrt{N}}\right)$. Substituting the value of $w_0$ from (iii), it follows that the total amplitude in state j after step (v) is: $\frac{1}{\sqrt{N}}\left(\sqrt{1 - x_j^2} - 2\langle \sqrt{1 - x^2} \rangle\right) + \frac{i}{\sqrt{N}}\left(x_j - 2\langle x \rangle\right)$.

(vi) The amplitude in state j, after step (v), as derived above, can be written as: $\frac{1}{\sqrt{N}}\left(-\sqrt{1 - x_j^2} + i x_j\right) - \frac{2i\langle x \rangle}{\sqrt{N}} + \frac{2}{\sqrt{N}}\left(\sqrt{1 - x_j^2} - \langle \sqrt{1 - x^2} \rangle\right)$. The first two terms are the dominant ones; the last term (in parentheses) contributes the error terms. When the phase is rotated by $\gamma_j \approx x_j$, it follows that since each $x_j$ is $O(\theta)$ & $\langle x \rangle$ is $O(\theta^3)$, the above becomes: $-\frac{1}{\sqrt{N}} - \frac{2i\langle x \rangle}{\sqrt{N}}(1 - O(\theta^2)) + \frac{1}{\sqrt{N}}O(\theta^4) + \frac{2}{\sqrt{N}}\left(\sqrt{1 - x_j^2} - \langle \sqrt{1 - x^2} \rangle\right) + \left(\frac{1}{\sqrt{N}}O(\theta^4) + \frac{i}{\sqrt{N}}O(\theta^3)\right)$. This may be written as: $\left(-\frac{1}{\sqrt{N}} - \frac{2i\langle x \rangle}{\sqrt{N}}\right) + \frac{2}{\sqrt{N}}\left(\sqrt{1 - x_j^2} - \langle \sqrt{1 - x^2} \rangle\right) + \left(\frac{1}{\sqrt{N}}O(\theta^4) + \frac{i}{\sqrt{N}}O(\theta^3)\right)$.

(vii) When the WH transform is taken, as mentioned in (iii) above, the amplitude in the $0^{th}$ state is $w_0 = \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} a_j$. The contribution from the term $\frac{2}{\sqrt{N}}\left(\sqrt{1 - x_j^2} - \langle \sqrt{1 - x^2} \rangle\right)$ is hence zero. Therefore the amplitude in the $0^{th}$ state is $(-1 - 2i\langle x \rangle + (O(\theta^4) + iO(\theta^3)))$. Therefore by precisely estimating the phase of the $0^{th}$ state as described at the end of section 2.2, $\langle x \rangle$ can be estimated with a precision of $O(\theta^3)$.

(viii) The probability of the system not being in the $0^{th}$ state is obtained from the expression for the amplitude in (vi). It is easily deduced that the contribution of the term $\left(-\frac{1}{\sqrt{N}} - \frac{2i\langle x \rangle}{\sqrt{N}}\right)$ to any state $j \neq 0$ is zero. By using the fact that each $x_j$ is $O(\theta)$, it follows that the probability due to the rest of the terms is $O(\theta^4)$.

4.0 Parallel Computing Section 2.2 discussed a simple case of quantum mechanical parallel computing where each processor had just to carry out a single operation. In general each processor has to carry out a finite number of operations and then share its result(s) with other processors. A problem with quantum mechanical systems is
that it is not possible to make a copy of a quantum state. 

The main issue in parallel computing, whether classical or quantum mechanical, is regarding how to partition the problem among different processors. If the problem of estimating the mean were being solved classically, the partitioning would be obvious, i.e. in case $O(1/\epsilon^2)$ samples have to be generated in all, have each of the $\eta$ processors generate $O(1/(\eta\epsilon^2))$ samples. This immediately gives a speed-up by $O(\eta)$. This procedure does not work with the quantum mechanical algorithm of section 3.3 because each of the processors only calculates the mean with a finite precision. Whereas the classical processors can calculate the mean of a certain fraction of randomly chosen samples exactly, the quantum mechanical algorithm of section 3.3 calculates the mean of all of the datapoints, but only with a certain precision. For example, if we allow each of the $\eta$ processors $O(1/(\eta\epsilon^{1.5}))$ steps, and then classically calculate the mean of the $\eta$ values, it is easily shown that the final result is only precise to $O(\eta^{0.166\ldots}\epsilon)$. If the final result has to be precise to $O(\epsilon)$, each of the processors has to go through $O(1/(\eta^{0.75}\epsilon^{1.5}))$ steps; therefore this only gives a speed-up of $O(\eta^{0.75})$.
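The exponents quoted in this paragraph can be checked with a short derivation. As a sketch, it assumes that a processor running the algorithm of section 3.3 for $T$ steps reaches a precision $\delta = T^{-2/3}$ (the inverse of $T = O(1/\delta^{1.5})$), and that the $\eta$ estimates have independent errors so that averaging them reduces the error by a factor $\sqrt{\eta}$. Constant factors are dropped; $T$ and $\delta$ are symbols introduced here for illustration.

```latex
\begin{align*}
T = \frac{1}{\eta\,\epsilon^{1.5}}
  &\;\Rightarrow\; \delta = T^{-2/3} = \eta^{2/3}\,\epsilon,
  &\frac{\delta}{\sqrt{\eta}} &= \eta^{2/3 - 1/2}\,\epsilon
      = \eta^{1/6}\,\epsilon \approx \eta^{0.166}\,\epsilon, \\
\frac{\delta}{\sqrt{\eta}} = \epsilon
  &\;\Rightarrow\; \delta = \eta^{1/2}\,\epsilon,
  &T &= \delta^{-3/2} = \frac{1}{\eta^{0.75}\,\epsilon^{1.5}} .
\end{align*}
```

Comparing the second line with the serial cost $1/\epsilon^{1.5}$ gives the speed-up of $\eta^{0.75}$.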



Another possibility is to take the $\eta$ quantum mechanical binary state systems obtained after the algorithm of section 3.3 and carry out suitable unitary & projection operations to try to get a single binary state system whose phase is rotated by a larger amount. The way such an approach might work is the following. Consider two independent quantum mechanical systems which have been through the procedure described in section 3.3. The phase of state $|0\rangle$ of both systems is rotated by $\phi$. The composite system is thus $(|0\rangle_1 \exp(i\phi) + |1\rangle_1)(|0\rangle_2 \exp(i\phi) + |1\rangle_2)$. This may be written as: $(|00\rangle \exp(2i\phi) + |01\rangle \exp(i\phi) + |10\rangle \exp(i\phi) + |11\rangle)$. If this is passed through a CONTROLLED-NOT gate with the first bit as the control, the new state becomes: $(|00\rangle \exp(2i\phi) + |01\rangle \exp(i\phi) + |11\rangle \exp(i\phi) + |10\rangle)$. If the second bit is now measured, then with a 0.5 probability it will be a 0, in which case the first bit will be in the state $(|0\rangle \exp(2i\phi) + |1\rangle)$, i.e. the angle by which the phase is rotated has been doubled. Therefore if we have $\eta$ such quantum mechanical systems with the phase of the $|0\rangle$ state of each of them rotated by $\phi$, then after repeating the above process on $\eta/2$ pairs, we obtain approximately $\eta/4$ binary state systems where the phase of the $|0\rangle$ state is rotated by $2\phi$. Assuming $\eta$ to be a power of 2, it follows that after $O(\log\eta)$ iterations, we obtain $O(1)$ systems where the phase of the $|0\rangle$ state is rotated by $O(\sqrt{\eta}\phi)$. This process thus gives a speed up of only $O(\sqrt{\eta})$. Section 4.2 describes a parallel algorithm with speed up of $O(\eta)$.
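The pairing step can be verified on the four amplitudes of the two-bit system. The Python sketch below is illustrative; the function name `combine_pair` and the dictionary representation keyed by (first bit, second bit) are assumptions made for readability.

```python
import cmath


def combine_pair(phi):
    """Combine two systems (|0>e^{i phi} + |1>) with a CONTROLLED-NOT.

    Returns the probability that the second bit is measured as 0 and the
    relative phase then left on the first bit.
    """
    e = cmath.exp(1j * phi)
    # Normalized amplitudes of the product state, keyed by (first_bit, second_bit).
    amp = {(0, 0): e * e / 2, (0, 1): e / 2, (1, 0): e / 2, (1, 1): 0.5 + 0j}
    # CONTROLLED-NOT with the first bit as control: flip the second bit when the first is 1.
    after = {(a, b ^ a): value for (a, b), value in amp.items()}
    p_zero = abs(after[(0, 0)]) ** 2 + abs(after[(1, 0)]) ** 2
    relative_phase = cmath.phase(after[(0, 0)] / after[(1, 0)])
    return p_zero, relative_phase
```

Each round halves the number of pairs that survive the measurement, so $\eta$ systems shrink by a factor of 4 per doubling of the phase, which is where the $\sqrt{\eta}$ limit comes from.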

4.1 Distributed Computing Distributed computing is the case of parallel computing where the proces- 
sors are remotely distributed and there is a cost associated with communication between them. We assume that all 
processors have access to either all the data or a representative portion of the data - this would be the case if the data 
were being computed by a simply specified rule. The algorithm easily extends to the case where different processors 
have access to different portions of the data, but we postpone discussing that to another occasion. 

The classical solution to this problem is immediate if each of the processors is able to communicate the result of its computation to a central location. It is then the same as the one discussed at the beginning of the previous section, i.e. have each processor generate $O(1/(\eta\epsilon^2))$ samples.

The obvious quantum mechanical solutions to this are the same as the two discussed in section 4.0. In the first suggestion, each processor has to communicate the full result of its computation to the central location; the speed up is $O(\eta^{0.75})$. In the second algorithm (where multiple systems were combined to give another system with a larger phase rotation), the speed up is only $O(\eta^{0.5})$; however it is now possible to use quantum teleportation [Teleport93] to transfer the processed quantum system to the central location using just two bits of classical information instead of the full result of the computation. This leads us into the final algorithm where the speed up is $O(\eta)$; also, each processor has just to transmit one bit of classical information to a central location.

4.2 Distributed implementation of algorithm of section 3.3 Assume that all processors have access to either all the data or a representative portion of the data. As mentioned before in section 3.3, since the result of the algorithm does not depend on how the state of the particle being rotated is coupled to other particles, the algorithm is easily extended to the case where the phase of only one of the many states of the system needs to be rotated, and the system is arbitrarily coupled to the environment, i.e. starting from the state $|0\rangle|e\rangle + |1\rangle|f\rangle$, produce the state $|0\rangle|e\rangle \exp(i\phi) + |1\rangle|f\rangle$.

As in section 2.2, consider the following quantum state of $\eta$ particles: $(|0\rangle_0 |0\rangle_1 \ldots |0\rangle_{\eta-1} + |1\rangle_0 |1\rangle_1 \ldots |1\rangle_{\eta-1})$. The algorithm of section 3.3 can be applied to each of the $\eta$ particles independently, to rotate the phase of the $|0\rangle$ state of each particle by $\phi$. The state of the system after this operation becomes: $(|0\rangle_0 |0\rangle_1 \ldots |0\rangle_{\eta-1} \exp(i\eta\phi) + |1\rangle_0 |1\rangle_1 \ldots |1\rangle_{\eta-1})$. Note that the particles are still physically separated.

Next apply the operation M (as described in section 1.1) to each of the $(\eta - 1)$ particles: $1, 2 \ldots \eta - 1$. The state now becomes proportional to: $(|0\rangle_0 (|0\rangle + |1\rangle)_1 \ldots (|0\rangle + |1\rangle)_{\eta-1} \exp(i\eta\phi) + |1\rangle_0 (|0\rangle - |1\rangle)_1 \ldots (|0\rangle - |1\rangle)_{\eta-1})$. Now if the states of the $(\eta - 1)$ particles $(1, 2 \ldots \eta - 1)$ are measured, it is easily seen that the state of the $0^{th}$ particle becomes $(|0\rangle_0 \exp(i\eta\phi) \pm |1\rangle_0)$ where the sign (plus or minus) depends on the results of the $(\eta - 1)$ measurements. In case an even number of the $(\eta - 1)$ measurements be 1's, the sign is positive; if an odd number be 1's, the sign is negative. Therefore if the results of all these measurements be transmitted to the location of the $0^{th}$ particle, the state of the $0^{th}$ particle becomes precisely known and we have accomplished our objective, i.e. of using $\eta$ remotely located binary state particles, each of which has the phase of one of its states rotated by $\phi$, to obtain a single binary state particle in which the phase of one of the states is rotated by $\eta\phi$.

The additional constraint in working with an $\eta$ particle system is that the total probability of obtaining a non-zero result in step (viii) of the algorithm be negligible for all $\eta$ particles, i.e. $\eta \times O(\theta^4) \times O(1/\theta^3)$ is $O(1)$. Therefore $\eta$ should be less than $O(1/\theta)$. The number of processors has to be less than $O(1/\theta)$ or equivalently $O(1/|\epsilon|^{0.5})$; this gives a bound on the speedup obtainable.

5. Final thoughts DNA computing is an exciting line of recent research. The advantage with this is that the processors (the DNA molecules) are of microscopic sizes and thus offer great possibilities for parallelism. Quantum mechanical effects also only occur on a microscopic scale. In case it is possible to design parallel quantum mechanical algorithms so that each processor follows identical rules, it presents the same possibility for large scale parallelism; this is in addition to the inbuilt parallelism of quantum mechanical effects that existing quantum mechanical algorithms use [Factor94], [Search96], [Median96]. This paper gives such an algorithm for the problem of estimating the mean where the speed-up using $\eta$ processors is $O(\eta)$.

NMR computing [NMR97a] [NMR97b] is a recent implementation of parallel quantum computing. As presently implemented, it does not assume coherence between different molecules. It is equivalent to classically combining the results from various quantum computers; to use another analogy, it is like incoherently combining the radiation from sources that are intrinsically coherent (e.g. a light bulb as opposed to a laser).